Overview

Dataset statistics

Number of variables: 29
Number of observations: 2129381
Missing cells: 18234791
Missing cells (%): 29.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 471.1 MiB
Average record size in memory: 232.0 B
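
The two derived figures above can be checked against the raw counts. The sketch below is a minimal illustration, assuming the missing-cell percentage is missing cells divided by (variables × observations) and the average record size is total memory divided by the number of observations. The variable names are illustrative and do not come from the profiler.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 29
n_observations = 2_129_381
missing_cells = 18_234_791
total_memory_mib = 471.1

# Share of all cells (rows x columns) that are missing.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row: convert MiB to bytes, then divide by the row count.
avg_record_bytes = total_memory_mib * 1024**2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report, which suggests the assumed definitions are the ones in use.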

Variable types

DateTime: 2
Categorical: 6
Unsupported: 1
Numeric: 8
Text: 12

Alerts

NUMBER OF PEDESTRIANS KILLED is highly imbalanced (99.6%) [Imbalance]
NUMBER OF CYCLIST INJURED is highly imbalanced (92.1%) [Imbalance]
NUMBER OF CYCLIST KILLED is highly imbalanced (99.9%) [Imbalance]
CONTRIBUTING FACTOR VEHICLE 4 is highly imbalanced (90.8%) [Imbalance]
CONTRIBUTING FACTOR VEHICLE 5 is highly imbalanced (90.1%) [Imbalance]
BOROUGH has 662185 (31.1%) missing values [Missing]
ZIP CODE has 662446 (31.1%) missing values [Missing]
LATITUDE has 239254 (11.2%) missing values [Missing]
LONGITUDE has 239254 (11.2%) missing values [Missing]
LOCATION has 239254 (11.2%) missing values [Missing]
ON STREET NAME has 456109 (21.4%) missing values [Missing]
CROSS STREET NAME has 811789 (38.1%) missing values [Missing]
OFF STREET NAME has 1765645 (82.9%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 2 has 333967 (15.7%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 3 has 1976273 (92.8%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 4 has 2094603 (98.4%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 5 has 2119915 (99.6%) missing values [Missing]
VEHICLE TYPE CODE 2 has 414316 (19.5%) missing values [Missing]
VEHICLE TYPE CODE 3 has 1982003 (93.1%) missing values [Missing]
VEHICLE TYPE CODE 4 has 2095841 (98.4%) missing values [Missing]
VEHICLE TYPE CODE 5 has 2120207 (99.6%) missing values [Missing]
LATITUDE is highly skewed (γ1 = -20.4084487) [Skewed]
NUMBER OF PERSONS KILLED is highly skewed (γ1 = 33.3170398) [Skewed]
NUMBER OF MOTORIST KILLED is highly skewed (γ1 = 53.62264977) [Skewed]
COLLISION_ID has unique values [Unique]
ZIP CODE is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
NUMBER OF PERSONS INJURED has 1631233 (76.6%) zeros [Zeros]
NUMBER OF PERSONS KILLED has 2126219 (99.9%) zeros [Zeros]
NUMBER OF PEDESTRIANS INJURED has 2012019 (94.5%) zeros [Zeros]
NUMBER OF MOTORIST INJURED has 1812117 (85.1%) zeros [Zeros]
NUMBER OF MOTORIST KILLED has 2128134 (99.9%) zeros [Zeros]
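
The report does not state how the imbalance percentages in these alerts are computed. One formula that reproduces the reported figures is one minus the Shannon entropy of the class counts divided by the maximum possible entropy for that number of classes. The sketch below treats that formula as an assumption, not as the profiler's documented method; `imbalance_score` is a hypothetical helper, and the counts are copied from the NUMBER OF CYCLIST INJURED and NUMBER OF CYCLIST KILLED sections of this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0 means the classes are perfectly balanced; values close to 1 mean a
    single class dominates. A variable with fewer than two classes scores 0
    by convention in this sketch.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    # Entropy in bits; classes with a zero count contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Class counts for the values 0, 1, 2, ... as listed later in the report.
cyclist_injured = [2_071_002, 57_689, 665, 24, 1]
cyclist_killed = [2_129_126, 254, 1]

print(f"CYCLIST INJURED: {100 * imbalance_score(cyclist_injured):.1f}%")
print(f"CYCLIST KILLED:  {100 * imbalance_score(cyclist_killed):.1f}%")
```

Under this assumption the two scores round to the 92.1% and 99.9% shown in the alerts. A variable in which 97.3% of rows share one value can therefore still score about 92%, because the score depends on the full distribution rather than on the share of the most common class alone.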

Reproduction

Analysis started: 2024-10-29 14:03:49.079553
Analysis finished: 2024-10-29 14:05:00.958893
Duration: 1 minute and 11.88 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

Distinct: 4497
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2024-10-22 00:00:00
Histogram with fixed size bins (bins=50)
Distinct: 1440
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 MiB
Minimum: 2024-10-29 00:00:00
Maximum: 2024-10-29 23:59:00
Histogram with fixed size bins (bins=50)

BOROUGH
Categorical

MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 662185
Missing (%): 31.1%
Memory size: 16.2 MiB
BROOKLYN: 467766
QUEENS: 393566
MANHATTAN: 327164
BRONX: 217181
STATEN ISLAND: 61519

Length

Max length: 13
Median length: 9
Mean length: 7.4520732
Min length: 5

Characters and Unicode

Total characters: 10933652
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: BROOKLYN
3rd row: BRONX
4th row: BROOKLYN
5th row: MANHATTAN

Common Values

Value           Count     Frequency (%)
BROOKLYN        467766    22.0%
QUEENS          393566    18.5%
MANHATTAN       327164    15.4%
BRONX           217181    10.2%
STATEN ISLAND   61519     2.9%
(Missing)       662185    31.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count     Frequency (%)
brooklyn    467766    30.6%
queens      393566    25.7%
manhattan   327164    21.4%
bronx       217181    14.2%
staten      61519     4.0%
island      61519     4.0%

Most occurring characters

ValueCountFrequency (%)
N 1855879
17.0%
O 1152713
10.5%
A 1104530
10.1%
E 848651
 
7.8%
T 777366
 
7.1%
R 684947
 
6.3%
B 684947
 
6.3%
L 529285
 
4.8%
S 516604
 
4.7%
Y 467766
 
4.3%
Other values (9) 2310964
21.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 10872133
99.4%
Space Separator 61519
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1855879
17.1%
O 1152713
10.6%
A 1104530
10.2%
E 848651
 
7.8%
T 777366
 
7.2%
R 684947
 
6.3%
B 684947
 
6.3%
L 529285
 
4.9%
S 516604
 
4.8%
Y 467766
 
4.3%
Other values (8) 2249445
20.7%
Space Separator
ValueCountFrequency (%)
61519
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10872133
99.4%
Common 61519
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1855879
17.1%
O 1152713
10.6%
A 1104530
10.2%
E 848651
 
7.8%
T 777366
 
7.2%
R 684947
 
6.3%
B 684947
 
6.3%
L 529285
 
4.9%
S 516604
 
4.8%
Y 467766
 
4.3%
Other values (8) 2249445
20.7%
Common
ValueCountFrequency (%)
61519
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10933652
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 1855879
17.0%
O 1152713
10.5%
A 1104530
10.1%
E 848651
 
7.8%
T 777366
 
7.1%
R 684947
 
6.3%
B 684947
 
6.3%
L 529285
 
4.8%
S 516604
 
4.7%
Y 467766
 
4.3%
Other values (9) 2310964
21.1%

ZIP CODE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 662446
Missing (%): 31.1%
Memory size: 16.2 MiB

LATITUDE
Real number (ℝ)

MISSING  SKEWED 

Distinct: 127320
Distinct (%): 6.7%
Missing: 239254
Missing (%): 11.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.627408
Minimum: 0
Maximum: 43.344444
Zeros: 4484
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 16.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 40.59657
Q1: 40.66762
median: 40.720615
Q3: 40.769623
95-th percentile: 40.861942
Maximum: 43.344444
Range: 43.344444
Interquartile range (IQR): 0.102003

Descriptive statistics

Standard deviation: 1.9827696
Coefficient of variation (CV): 0.048803742
Kurtosis: 415.18085
Mean: 40.627408
Median Absolute Deviation (MAD): 0.051345
Skewness: -20.408449
Sum: 76790961
Variance: 3.9313751
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                   Count      Frequency (%)
0                       4484       0.2%
40.861862               917        < 0.1%
40.696033               787        < 0.1%
40.8047                 692        < 0.1%
40.608757               673        < 0.1%
40.759308               630        < 0.1%
40.798256               627        < 0.1%
40.6960346              587        < 0.1%
40.675735               582        < 0.1%
40.658577               538        < 0.1%
Other values (127310)   1879610    88.3%
(Missing)               239254     11.2%

Smallest values
Value                   Count      Frequency (%)
0                       4484       0.2%
30.78418                1          < 0.1%
34.783634               1          < 0.1%
40.498947               1          < 0.1%
40.4989488              2          < 0.1%
40.4991346              1          < 0.1%
40.49931                1          < 0.1%
40.4994787              1          < 0.1%
40.499659               1          < 0.1%
40.499672               1          < 0.1%

Largest values
Value                   Count      Frequency (%)
43.344444               1          < 0.1%
42.64154                1          < 0.1%
42.318317               1          < 0.1%
42.107204               1          < 0.1%
41.91661                1          < 0.1%
41.34796                1          < 0.1%
41.258785               1          < 0.1%
41.12615                5          < 0.1%
41.12421                1          < 0.1%
41.061634               2          < 0.1%

LONGITUDE
Real number (ℝ)

MISSING 

Distinct: 98796
Distinct (%): 5.2%
Missing: 239254
Missing (%): 11.2%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.751569
Minimum: -201.35999
Maximum: 0
Zeros: 4484
Zeros (%): 0.2%
Negative: 1885643
Negative (%): 88.6%
Memory size: 16.2 MiB

Quantile statistics

Minimum: -201.35999
5-th percentile: -74.036926
Q1: -73.974754
median: -73.927115
Q3: -73.866798
95-th percentile: -73.76325
Maximum: 0
Range: 201.35999
Interquartile range (IQR): 0.1079561

Descriptive statistics

Standard deviation: 3.723789
Coefficient of variation (CV): -0.05049098
Kurtosis: 439.19555
Mean: -73.751569
Median Absolute Deviation (MAD): 0.052583
Skewness: 16.191619
Sum: -1.3939983 × 10^8
Variance: 13.866604
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                  Count      Frequency (%)
0                      4484       0.2%
-73.89063              787        < 0.1%
-73.91282              719        < 0.1%
-73.98453              702        < 0.1%
-73.89686              684        < 0.1%
-74.038086             674        < 0.1%
-73.91243              654        < 0.1%
-73.94476              614        < 0.1%
-73.9112               588        < 0.1%
-73.9845292            587        < 0.1%
Other values (98786)   1879634    88.3%
(Missing)              239254     11.2%

Smallest values
Value                  Count      Frequency (%)
-201.35999             1          < 0.1%
-201.23706             105        < 0.1%
-89.13527              1          < 0.1%
-86.76847              1          < 0.1%
-79.61955              1          < 0.1%
-79.00183              1          < 0.1%
-76.2634               1          < 0.1%
-76.02163              1          < 0.1%
-74.742                7          < 0.1%
-74.25496              1          < 0.1%

Largest values
Value                  Count      Frequency (%)
0                      4484       0.2%
-32.768513             16         < 0.1%
-47.209625             3          < 0.1%
-73.66301              1          < 0.1%
-73.70055              2          < 0.1%
-73.700584             11         < 0.1%
-73.7005968            10         < 0.1%
-73.70061              5          < 0.1%
-73.70071              4          < 0.1%
-73.70073              1          < 0.1%
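
The LATITUDE and LONGITUDE statistics above place the 5th to 95th percentiles within 40.59657 to 40.861942 and -74.036926 to -73.76325, while the extremes include 4484 zeros in each column and longitudes near -201, which lie outside the valid range of -180 to 180. One way to handle such values before further analysis is to mark coordinates outside a plausible bounding box as missing. The sketch below is illustrative only: the bounding box is an assumption chosen loosely around the reported percentiles, `clean_coordinate` is a hypothetical helper, and the sample pairs other than the first are invented combinations of extreme values from the report.

```python
import math

# Assumed bounding box (not from the report); adjust to the study area.
LAT_RANGE = (40.4, 41.0)
LON_RANGE = (-74.3, -73.6)


def clean_coordinate(lat, lon):
    """Return (lat, lon) if both fall inside the box, else (nan, nan)."""
    inside = (
        lat is not None
        and lon is not None
        and LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        and LON_RANGE[0] <= lon <= LON_RANGE[1]
    )
    return (lat, lon) if inside else (math.nan, math.nan)


samples = [
    (40.667202, -73.8665),  # first sample row of LOCATION in the report
    (0.0, 0.0),             # zero placeholder seen 4484 times per column
    (40.7, -201.23706),     # hypothetical pair: longitude below -180
    (43.344444, -73.9),     # hypothetical pair: latitude at the reported maximum
    (None, None),           # already missing
]

cleaned = [clean_coordinate(lat, lon) for lat, lon in samples]
for original, result in zip(samples, cleaned):
    print(original, "->", result)
```

Only the first pair survives. Whether values such as latitude 41.06 should be kept depends on how tightly the box is drawn, which is a judgment call outside the scope of this report.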

LOCATION
Text

MISSING 

Distinct: 291511
Distinct (%): 15.4%
Missing: 239254
Missing (%): 11.2%
Memory size: 16.2 MiB

Length

Max length: 25
Median length: 24
Mean length: 22.752478
Min length: 10

Characters and Unicode

Total characters: 43005073
Distinct characters: 16
Distinct categories: 6
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 160873
Unique (%): 8.5%

Sample

1st row: (40.667202, -73.8665)
2nd row: (40.683304, -73.917274)
3rd row: (40.709183, -73.956825)
4th row: (40.86816, -73.83148)
5th row: (40.67172, -73.8971)
ValueCountFrequency (%)
0.0 8968
 
0.2%
40.861862 917
 
< 0.1%
73.89063 787
 
< 0.1%
40.696033 787
 
< 0.1%
73.91282 719
 
< 0.1%
73.98453 702
 
< 0.1%
40.8047 692
 
< 0.1%
73.89686 684
 
< 0.1%
74.038086 674
 
< 0.1%
40.608757 673
 
< 0.1%
Other values (226105) 3764651
99.6%

Most occurring characters

ValueCountFrequency (%)
7 4708827
10.9%
4 4081519
 
9.5%
. 3780254
 
8.8%
3 3584874
 
8.3%
0 3485511
 
8.1%
9 2763135
 
6.4%
8 2713327
 
6.3%
6 2681947
 
6.2%
5 2146280
 
5.0%
( 1890127
 
4.4%
Other values (6) 11169272
26.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 29778668
69.2%
Other Punctuation 5670381
 
13.2%
Open Punctuation 1890127
 
4.4%
Space Separator 1890127
 
4.4%
Close Punctuation 1890127
 
4.4%
Dash Punctuation 1885643
 
4.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 4708827
15.8%
4 4081519
13.7%
3 3584874
12.0%
0 3485511
11.7%
9 2763135
9.3%
8 2713327
9.1%
6 2681947
9.0%
5 2146280
7.2%
2 1825473
 
6.1%
1 1787775
 
6.0%
Other Punctuation
ValueCountFrequency (%)
. 3780254
66.7%
, 1890127
33.3%
Open Punctuation
ValueCountFrequency (%)
( 1890127
100.0%
Space Separator
ValueCountFrequency (%)
1890127
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1890127
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1885643
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 43005073
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
7 4708827
10.9%
4 4081519
 
9.5%
. 3780254
 
8.8%
3 3584874
 
8.3%
0 3485511
 
8.1%
9 2763135
 
6.4%
8 2713327
 
6.3%
6 2681947
 
6.2%
5 2146280
 
5.0%
( 1890127
 
4.4%
Other values (6) 11169272
26.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43005073
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7 4708827
10.9%
4 4081519
 
9.5%
. 3780254
 
8.8%
3 3584874
 
8.3%
0 3485511
 
8.1%
9 2763135
 
6.4%
8 2713327
 
6.3%
6 2681947
 
6.2%
5 2146280
 
5.0%
( 1890127
 
4.4%
Other values (6) 11169272
26.0%

ON STREET NAME
Text

MISSING 

Distinct: 18753
Distinct (%): 1.1%
Missing: 456109
Missing (%): 21.4%
Memory size: 16.2 MiB

Length

Max length: 32
Median length: 32
Mean length: 29.282575
Min length: 2

Characters and Unicode

Total characters: 48997712
Distinct characters: 75
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6708
Unique (%): 0.4%

Sample

1st row: WHITESTONE EXPRESSWAY
2nd row: QUEENSBORO BRIDGE UPPER
3rd row: THROGS NECK BRIDGE
4th row: SARATOGA AVENUE
5th row: MAJOR DEEGAN EXPRESSWAY RAMP
ValueCountFrequency (%)
avenue 621705
 
16.1%
street 532163
 
13.8%
east 156684
 
4.1%
boulevard 129656
 
3.4%
west 117108
 
3.0%
parkway 77106
 
2.0%
road 69573
 
1.8%
expressway 65720
 
1.7%
island 31466
 
0.8%
queens 27881
 
0.7%
Other values (5405) 2032287
52.6%

Most occurring characters

ValueCountFrequency (%)
27613470
56.4%
E 3758695
 
7.7%
A 1998948
 
4.1%
T 1872893
 
3.8%
R 1711607
 
3.5%
N 1462599
 
3.0%
S 1442202
 
2.9%
U 999980
 
2.0%
O 890106
 
1.8%
V 871501
 
1.8%
Other values (65) 6375711
 
13.0%

Most occurring categories

ValueCountFrequency (%)
Space Separator 27613470
56.4%
Uppercase Letter 20053815
40.9%
Decimal Number 1199204
 
2.4%
Lowercase Letter 119354
 
0.2%
Other Punctuation 4899
 
< 0.1%
Open Punctuation 3399
 
< 0.1%
Close Punctuation 3394
 
< 0.1%
Dash Punctuation 175
 
< 0.1%
Control 1
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 3758695
18.7%
A 1998948
10.0%
T 1872893
9.3%
R 1711607
 
8.5%
N 1462599
 
7.3%
S 1442202
 
7.2%
U 999980
 
5.0%
O 890106
 
4.4%
V 871501
 
4.3%
L 659048
 
3.3%
Other values (16) 4386236
21.9%
Lowercase Letter
ValueCountFrequency (%)
e 16058
13.5%
r 10563
 
8.9%
a 10008
 
8.4%
n 9996
 
8.4%
t 8752
 
7.3%
s 7328
 
6.1%
o 7030
 
5.9%
y 5758
 
4.8%
l 5497
 
4.6%
d 4642
 
3.9%
Other values (16) 33722
28.3%
Decimal Number
ValueCountFrequency (%)
1 273257
22.8%
3 135504
11.3%
2 133999
11.2%
4 113521
9.5%
5 110971
9.3%
6 97477
 
8.1%
8 90297
 
7.5%
7 88435
 
7.4%
9 79076
 
6.6%
0 76667
 
6.4%
Other Punctuation
ValueCountFrequency (%)
. 3646
74.4%
/ 1114
 
22.7%
& 64
 
1.3%
' 37
 
0.8%
, 16
 
0.3%
# 16
 
0.3%
@ 6
 
0.1%
Space Separator
ValueCountFrequency (%)
27613470
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3399
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3394
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 175
100.0%
Control
ValueCountFrequency (%)
 1
100.0%
Math Symbol
ValueCountFrequency (%)
> 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28824543
58.8%
Latin 20173169
41.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 3758695
18.6%
A 1998948
9.9%
T 1872893
9.3%
R 1711607
 
8.5%
N 1462599
 
7.3%
S 1442202
 
7.1%
U 999980
 
5.0%
O 890106
 
4.4%
V 871501
 
4.3%
L 659048
 
3.3%
Other values (42) 4505590
22.3%
Common
ValueCountFrequency (%)
27613470
95.8%
1 273257
 
0.9%
3 135504
 
0.5%
2 133999
 
0.5%
4 113521
 
0.4%
5 110971
 
0.4%
6 97477
 
0.3%
8 90297
 
0.3%
7 88435
 
0.3%
9 79076
 
0.3%
Other values (13) 88536
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48997712
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
27613470
56.4%
E 3758695
 
7.7%
A 1998948
 
4.1%
T 1872893
 
3.8%
R 1711607
 
3.5%
N 1462599
 
3.0%
S 1442202
 
2.9%
U 999980
 
2.0%
O 890106
 
1.8%
V 871501
 
1.8%
Other values (65) 6375711
 
13.0%

CROSS STREET NAME
Text

MISSING 

Distinct: 20402
Distinct (%): 1.5%
Missing: 811789
Missing (%): 38.1%
Memory size: 16.2 MiB

Length

Max length: 32
Median length: 31
Mean length: 22.514565
Min length: 1

Characters and Unicode

Total characters: 29665011
Distinct characters: 76
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6283
Unique (%): 0.5%

Sample

1st row: 20 AVENUE
2nd row: DECATUR STREET
3rd row: EAST 43 STREET
4th row: EAST GATE PLAZA
5th row: west 80 street -west 81 street
ValueCountFrequency (%)
avenue 577097
 
19.8%
street 468354
 
16.1%
east 114342
 
3.9%
west 72198
 
2.5%
boulevard 70318
 
2.4%
road 56727
 
1.9%
place 34602
 
1.2%
parkway 27271
 
0.9%
3 19160
 
0.7%
park 17736
 
0.6%
Other values (5503) 1456274
50.0%

Most occurring characters

ValueCountFrequency (%)
14147683
47.7%
E 2997641
 
10.1%
T 1482217
 
5.0%
A 1450048
 
4.9%
R 1171483
 
3.9%
N 1098144
 
3.7%
S 1008745
 
3.4%
U 793881
 
2.7%
V 724004
 
2.4%
O 591444
 
2.0%
Other values (66) 4199721
 
14.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 14361166
48.4%
Space Separator 14147683
47.7%
Decimal Number 1090877
 
3.7%
Lowercase Letter 64924
 
0.2%
Other Punctuation 321
 
< 0.1%
Dash Punctuation 28
 
< 0.1%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Control 2
 
< 0.1%
Math Symbol 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 2997641
20.9%
T 1482217
10.3%
A 1450048
10.1%
R 1171483
 
8.2%
N 1098144
 
7.6%
S 1008745
 
7.0%
U 793881
 
5.5%
V 724004
 
5.0%
O 591444
 
4.1%
L 447421
 
3.1%
Other values (16) 2596138
18.1%
Lowercase Letter
ValueCountFrequency (%)
e 12106
18.6%
t 6733
10.4%
a 6382
9.8%
r 5356
 
8.2%
n 4610
 
7.1%
s 4229
 
6.5%
o 3129
 
4.8%
v 3016
 
4.6%
u 2648
 
4.1%
l 2340
 
3.6%
Other values (16) 14375
22.1%
Decimal Number
ValueCountFrequency (%)
1 242009
22.2%
2 128448
11.8%
3 119975
11.0%
4 98442
9.0%
5 98114
9.0%
8 86578
 
7.9%
7 86410
 
7.9%
6 85881
 
7.9%
9 74792
 
6.9%
0 70228
 
6.4%
Other Punctuation
ValueCountFrequency (%)
/ 135
42.1%
. 75
23.4%
& 54
 
16.8%
' 51
 
15.9%
? 3
 
0.9%
, 3
 
0.9%
Space Separator
ValueCountFrequency (%)
14147683
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 28
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Control
ValueCountFrequency (%)
 2
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15238921
51.4%
Latin 14426090
48.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 2997641
20.8%
T 1482217
10.3%
A 1450048
10.1%
R 1171483
 
8.1%
N 1098144
 
7.6%
S 1008745
 
7.0%
U 793881
 
5.5%
V 724004
 
5.0%
O 591444
 
4.1%
L 447421
 
3.1%
Other values (42) 2661062
18.4%
Common
ValueCountFrequency (%)
14147683
92.8%
1 242009
 
1.6%
2 128448
 
0.8%
3 119975
 
0.8%
4 98442
 
0.6%
5 98114
 
0.6%
8 86578
 
0.6%
7 86410
 
0.6%
6 85881
 
0.6%
9 74792
 
0.5%
Other values (14) 70589
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 29665010
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14147683
47.7%
E 2997641
 
10.1%
T 1482217
 
5.0%
A 1450048
 
4.9%
R 1171483
 
3.9%
N 1098144
 
3.7%
S 1008745
 
3.4%
U 793881
 
2.7%
V 724004
 
2.4%
O 591444
 
2.0%
Other values (65) 4199720
 
14.2%
Specials
ValueCountFrequency (%)
� 1
100.0%

OFF STREET NAME
Text

MISSING 

Distinct: 235654
Distinct (%): 64.8%
Missing: 1765645
Missing (%): 82.9%
Memory size: 16.2 MiB

Length

Max length: 40
Median length: 40
Mean length: 35.477162
Min length: 8

Characters and Unicode

Total characters: 12904321
Distinct characters: 84
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 183367
Unique (%): 50.4%

Sample

1st row: 1211 LORING AVENUE
2nd row: 344 BAYCHESTER AVENUE
3rd row: 2047 PITKIN AVENUE
4th row: 480 DEAN STREET
5th row: 878 FLATBUSH AVENUE
ValueCountFrequency (%)
avenue 144192
 
11.9%
street 132007
 
10.9%
east 34796
 
2.9%
west 25157
 
2.1%
boulevard 22974
 
1.9%
road 17125
 
1.4%
lot 7881
 
0.7%
parking 7267
 
0.6%
parkway 7259
 
0.6%
place 7116
 
0.6%
Other values (27829) 802957
66.4%

Most occurring characters

ValueCountFrequency (%)
6983095
54.1%
E 833178
 
6.5%
T 456311
 
3.5%
A 425986
 
3.3%
R 354308
 
2.7%
N 311113
 
2.4%
S 299102
 
2.3%
1 289612
 
2.2%
U 211823
 
1.6%
V 197588
 
1.5%
Other values (74) 2542205
 
19.7%

Most occurring categories

ValueCountFrequency (%)
Space Separator 6983095
54.1%
Uppercase Letter 4281228
33.2%
Decimal Number 1514856
 
11.7%
Dash Punctuation 85527
 
0.7%
Lowercase Letter 25390
 
0.2%
Other Punctuation 9588
 
0.1%
Open Punctuation 2311
 
< 0.1%
Close Punctuation 2300
 
< 0.1%
Modifier Symbol 20
 
< 0.1%
Connector Punctuation 3
 
< 0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 833178
19.5%
T 456311
10.7%
A 425986
10.0%
R 354308
8.3%
N 311113
 
7.3%
S 299102
 
7.0%
U 211823
 
4.9%
V 197588
 
4.6%
O 196220
 
4.6%
L 147675
 
3.4%
Other values (16) 847924
19.8%
Lowercase Letter
ValueCountFrequency (%)
e 4224
16.6%
t 2945
11.6%
r 2395
9.4%
a 2247
 
8.8%
n 1669
 
6.6%
s 1646
 
6.5%
o 1348
 
5.3%
v 1080
 
4.3%
d 1029
 
4.1%
l 1022
 
4.0%
Other values (16) 5785
22.8%
Other Punctuation
ValueCountFrequency (%)
/ 6436
67.1%
& 1741
 
18.2%
. 1002
 
10.5%
@ 145
 
1.5%
, 83
 
0.9%
: 60
 
0.6%
# 54
 
0.6%
' 50
 
0.5%
* 8
 
0.1%
? 4
 
< 0.1%
Other values (2) 5
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 289612
19.1%
2 196670
13.0%
0 170457
11.3%
3 154465
10.2%
5 153006
10.1%
4 135540
8.9%
6 110553
 
7.3%
7 107864
 
7.1%
8 102146
 
6.7%
9 94543
 
6.2%
Close Punctuation
ValueCountFrequency (%)
) 2299
> 99.9%
] 1
 
< 0.1%
Control
ValueCountFrequency (%)
1
50.0%
 1
50.0%
Space Separator
ValueCountFrequency (%)
6983095
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 85527
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2311
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 20
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Math Symbol
ValueCountFrequency (%)
= 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 8597703
66.6%
Latin 4306618
33.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 833178
19.3%
T 456311
10.6%
A 425986
9.9%
R 354308
8.2%
N 311113
 
7.2%
S 299102
 
6.9%
U 211823
 
4.9%
V 197588
 
4.6%
O 196220
 
4.6%
L 147675
 
3.4%
Other values (42) 873314
20.3%
Common
ValueCountFrequency (%)
6983095
81.2%
1 289612
 
3.4%
2 196670
 
2.3%
0 170457
 
2.0%
3 154465
 
1.8%
5 153006
 
1.8%
4 135540
 
1.6%
6 110553
 
1.3%
7 107864
 
1.3%
8 102146
 
1.2%
Other values (22) 194295
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12904321
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6983095
54.1%
E 833178
 
6.5%
T 456311
 
3.5%
A 425986
 
3.3%
R 354308
 
2.7%
N 311113
 
2.4%
S 299102
 
2.3%
1 289612
 
2.2%
U 211823
 
1.6%
V 197588
 
1.5%
Other values (74) 2542205
 
19.7%

NUMBER OF PERSONS INJURED
Real number (ℝ)

ZEROS 

Distinct: 32
Distinct (%): < 0.1%
Missing: 18
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.31729959
Minimum: 0
Maximum: 43
Zeros: 1631233
Zeros (%): 76.6%
Negative: 0
Negative (%): 0.0%
Memory size: 16.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 43
Range: 43
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.70657021
Coefficient of variation (CV): 2.2268236
Kurtosis: 49.109714
Mean: 0.31729959
Median Absolute Deviation (MAD): 0
Skewness: 4.1828763
Sum: 675646
Variance: 0.49924146
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)
ValueCountFrequency (%)
0 1631233
76.6%
1 386459
 
18.1%
2 72889
 
3.4%
3 23855
 
1.1%
4 8854
 
0.4%
5 3388
 
0.2%
6 1421
 
0.1%
7 596
 
< 0.1%
8 267
 
< 0.1%
9 135
 
< 0.1%
Other values (22) 266
 
< 0.1%
ValueCountFrequency (%)
0 1631233
76.6%
1 386459
 
18.1%
2 72889
 
3.4%
3 23855
 
1.1%
4 8854
 
0.4%
5 3388
 
0.2%
6 1421
 
0.1%
7 596
 
< 0.1%
8 267
 
< 0.1%
9 135
 
< 0.1%
ValueCountFrequency (%)
43 1
 
< 0.1%
40 1
 
< 0.1%
34 1
 
< 0.1%
32 1
 
< 0.1%
31 1
 
< 0.1%
27 1
 
< 0.1%
25 1
 
< 0.1%
24 3
< 0.1%
23 1
 
< 0.1%
22 3
< 0.1%

NUMBER OF PERSONS KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 7
Distinct (%): < 0.1%
Missing: 31
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0015333318
Minimum: 0
Maximum: 8
Zeros: 2126219
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 16.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.041356728
Coefficient of variation (CV): 26.971807
Kurtosis: 1870.1495
Mean: 0.0015333318
Median Absolute Deviation (MAD): 0
Skewness: 33.31704
Sum: 3265
Variance: 0.0017103789
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 2126219
99.9%
1 3029
 
0.1%
2 83
 
< 0.1%
3 12
 
< 0.1%
4 4
 
< 0.1%
5 2
 
< 0.1%
8 1
 
< 0.1%
(Missing) 31
 
< 0.1%
ValueCountFrequency (%)
0 2126219
99.9%
1 3029
 
0.1%
2 83
 
< 0.1%
3 12
 
< 0.1%
4 4
 
< 0.1%
5 2
 
< 0.1%
8 1
 
< 0.1%
ValueCountFrequency (%)
8 1
 
< 0.1%
5 2
 
< 0.1%
4 4
 
< 0.1%
3 12
 
< 0.1%
2 83
 
< 0.1%
1 3029
 
0.1%
0 2126219
99.9%

NUMBER OF PEDESTRIANS INJURED
Real number (ℝ)

ZEROS 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.057489947
Minimum: 0
Maximum: 27
Zeros: 2012019
Zeros (%): 94.5%
Negative: 0
Negative (%): 0.0%
Memory size: 16.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 27
Range: 27
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.24591278
Coefficient of variation (CV): 4.2774918
Kurtosis: 123.1372
Mean: 0.057489947
Median Absolute Deviation (MAD): 0
Skewness: 5.5944946
Sum: 122418
Variance: 0.060473093
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
0 2012019
94.5%
1 113048
 
5.3%
2 3823
 
0.2%
3 379
 
< 0.1%
4 62
 
< 0.1%
5 26
 
< 0.1%
6 11
 
< 0.1%
7 5
 
< 0.1%
9 2
 
< 0.1%
8 2
 
< 0.1%
Other values (4) 4
 
< 0.1%
ValueCountFrequency (%)
0 2012019
94.5%
1 113048
 
5.3%
2 3823
 
0.2%
3 379
 
< 0.1%
4 62
 
< 0.1%
5 26
 
< 0.1%
6 11
 
< 0.1%
7 5
 
< 0.1%
8 2
 
< 0.1%
9 2
 
< 0.1%
ValueCountFrequency (%)
27 1
 
< 0.1%
19 1
 
< 0.1%
15 1
 
< 0.1%
13 1
 
< 0.1%
9 2
 
< 0.1%
8 2
 
< 0.1%
7 5
 
< 0.1%
6 11
 
< 0.1%
5 26
< 0.1%
4 62
< 0.1%

NUMBER OF PEDESTRIANS KILLED
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 MiB
0: 2127794
1: 1572
2: 13
6: 1
4: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2129381
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 2127794
99.9%
1 1572
 
0.1%
2 13
 
< 0.1%
6 1
 
< 0.1%
4 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 2127794
99.9%
1 1572
 
0.1%
2 13
 
< 0.1%
6 1
 
< 0.1%
4 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 2127794
99.9%
1 1572
 
0.1%
2 13
 
< 0.1%
6 1
 
< 0.1%
4 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2129381
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2127794
99.9%
1 1572
 
0.1%
2 13
 
< 0.1%
6 1
 
< 0.1%
4 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 2129381
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2127794
99.9%
1 1572
 
0.1%
2 13
 
< 0.1%
6 1
 
< 0.1%
4 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2129381
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2127794
99.9%
1 1572
 
0.1%
2 13
 
< 0.1%
6 1
 
< 0.1%
4 1
 
< 0.1%

NUMBER OF CYCLIST INJURED
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 MiB
0: 2071002
1: 57689
2: 665
3: 24
4: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2129381
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 2071002
97.3%
1 57689
 
2.7%
2 665
 
< 0.1%
3 24
 
< 0.1%
4 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 2071002
97.3%
1 57689
 
2.7%
2 665
 
< 0.1%
3 24
 
< 0.1%
4 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 2071002
97.3%
1 57689
 
2.7%
2 665
 
< 0.1%
3 24
 
< 0.1%
4 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2129381
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2071002
97.3%
1 57689
 
2.7%
2 665
 
< 0.1%
3 24
 
< 0.1%
4 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 2129381
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2071002
97.3%
1 57689
 
2.7%
2 665
 
< 0.1%
3 24
 
< 0.1%
4 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2129381
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2071002
97.3%
1 57689
 
2.7%
2 665
 
< 0.1%
3 24
 
< 0.1%
4 1
 
< 0.1%

NUMBER OF CYCLIST KILLED
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 MiB
0: 2129126
1: 254
2: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2129381
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 2129126
> 99.9%
1 254
 
< 0.1%
2 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 2129126
> 99.9%
1 254
 
< 0.1%
2 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 2129126
> 99.9%
1 254
 
< 0.1%
2 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2129381
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2129126
> 99.9%
1 254
 
< 0.1%
2 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 2129381
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2129126
> 99.9%
1 254
 
< 0.1%
2 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2129381
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2129126
> 99.9%
1 254
 
< 0.1%
2 1
 
< 0.1%

NUMBER OF MOTORIST INJURED
Real number (ℝ)

ZEROS 

Distinct 31
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.22794887
Minimum 0
Maximum 43
Zeros 1812117
Zeros (%) 85.1%
Negative 0
Negative (%) 0.0%
Memory size 16.2 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 1
Maximum 43
Range 43
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 0.66782754
Coefficient of variation (CV) 2.9297251
Kurtosis 60.990863
Mean 0.22794887
Median Absolute Deviation (MAD) 0
Skewness 5.039144
Sum 485390
Variance 0.44599362
Monotonicity Not monotonic
Histogram with fixed size bins (bins=31)

Common Values

Value Count Frequency (%)
0 1812117 85.1%
1 213175 10.0%
2 66369 3.1%
3 23126 1.1%
4 8672 0.4%
5 3336 0.2%
6 1373 0.1%
7 570 < 0.1%
8 258 < 0.1%
9 130 < 0.1%
Other values (21) 255 < 0.1%

Minimum values

Value Count Frequency (%)
0 1812117 85.1%
1 213175 10.0%
2 66369 3.1%
3 23126 1.1%
4 8672 0.4%
5 3336 0.2%
6 1373 0.1%
7 570 < 0.1%
8 258 < 0.1%
9 130 < 0.1%

Maximum values

Value Count Frequency (%)
43 1 < 0.1%
40 1 < 0.1%
34 1 < 0.1%
31 1 < 0.1%
30 1 < 0.1%
25 1 < 0.1%
24 3 < 0.1%
23 1 < 0.1%
22 2 < 0.1%
21 1 < 0.1%

NUMBER OF MOTORIST KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.00063445668
Minimum 0
Maximum 5
Zeros 2128134
Zeros (%) 99.9%
Negative 0
Negative (%) 0.0%
Memory size 16.2 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 0
Maximum 5
Range 5
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 0.027566514
Coefficient of variation (CV) 43.449008
Kurtosis 4021.6825
Mean 0.00063445668
Median Absolute Deviation (MAD) 0
Skewness 53.62265
Sum 1351
Variance 0.00075991267
Monotonicity Not monotonic
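
Because this variable takes only six distinct values, its moments can be recomputed directly from its value counts. The sketch below uses the counts reported for this variable and the population (uncorrected) formulas; whether the report applies a small-sample correction is not stated, but at this sample size the difference is negligible. The name `moments_from_counts` is illustrative.

```python
def moments_from_counts(counts):
    """Return (mean, variance, skewness) for a {value: count} frequency table."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Second and third central moments, weighted by the counts.
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    return mean, m2, m3 / m2 ** 1.5


# Value counts for NUMBER OF MOTORIST KILLED.
killed = {0: 2128134, 1: 1165, 2: 66, 3: 12, 4: 2, 5: 2}
mean, variance, skewness = moments_from_counts(killed)
print(mean, variance, skewness)
```

The large positive skewness follows from the shape of the table: almost every row is 0, and a handful of rows reach up to 5.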
Histogram with fixed size bins (bins=6)

Common Values

Value Count Frequency (%)
0 2128134 99.9%
1 1165 0.1%
2 66 < 0.1%
3 12 < 0.1%
4 2 < 0.1%
5 2 < 0.1%

Minimum values

Value Count Frequency (%)
0 2128134 99.9%
1 1165 0.1%
2 66 < 0.1%
3 12 < 0.1%
4 2 < 0.1%
5 2 < 0.1%

Maximum values

Value Count Frequency (%)
5 2 < 0.1%
4 2 < 0.1%
3 12 < 0.1%
2 66 < 0.1%
1 1165 0.1%
0 2128134 99.9%

CONTRIBUTING FACTOR VEHICLE 1
Text

Distinct 61
Distinct (%) < 0.1%
Missing 7158
Missing (%) 0.3%
Memory size 16.2 MiB

Length

Max length 53
Median length 43
Mean length 19.550142
Min length 1

Characters and Unicode

Total characters 41489762
Distinct characters 55
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Aggressive Driving/Road Rage
2nd row Pavement Slippery
3rd row Following Too Closely
4th row Unspecified
5th row Unspecified
ValueCountFrequency (%)
unspecified 720086
17.0%
driver 462166
 
10.9%
inattention/distraction 428425
 
10.1%
too 167717
 
4.0%
closely 167717
 
4.0%
to 152236
 
3.6%
failure 133171
 
3.1%
yield 126802
 
3.0%
right-of-way 126802
 
3.0%
following 114279
 
2.7%
Other values (96) 1639674
38.7%
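
The word table above appears to be built by lower-casing each value and splitting it on whitespace, which would explain why `inattention/distraction` and `right-of-way` show up as single tokens. That tokenisation rule is inferred from the table, not stated by the report. A minimal sketch under that assumption, using the five sample rows shown for this variable (the name `word_counts` is illustrative):

```python
from collections import Counter


def word_counts(values):
    """Count lower-cased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(str(value).lower().split())
    return counts


sample_rows = [
    "Aggressive Driving/Road Rage",
    "Pavement Slippery",
    "Following Too Closely",
    "Unspecified",
    "Unspecified",
]
print(word_counts(sample_rows).most_common(3))
```

Splitting only on whitespace keeps slash- and hyphen-joined terms together, so a value such as "Driver Inattention/Distraction" contributes two tokens, not three.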

Most occurring characters

ValueCountFrequency (%)
i 4661397
 
11.2%
e 4223008
 
10.2%
n 3607901
 
8.7%
t 2883283
 
6.9%
o 2452467
 
5.9%
r 2443759
 
5.9%
s 2152694
 
5.2%
2116852
 
5.1%
a 2050152
 
4.9%
c 1593290
 
3.8%
Other values (45) 13304959
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 33875696
81.6%
Uppercase Letter 4695000
 
11.3%
Space Separator 2116852
 
5.1%
Other Punctuation 542084
 
1.3%
Dash Punctuation 255400
 
0.6%
Open Punctuation 2259
 
< 0.1%
Close Punctuation 2259
 
< 0.1%
Decimal Number 212
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 4661397
13.8%
e 4223008
12.5%
n 3607901
10.7%
t 2883283
8.5%
o 2452467
 
7.2%
r 2443759
 
7.2%
s 2152694
 
6.4%
a 2050152
 
6.1%
c 1593290
 
4.7%
l 1285072
 
3.8%
Other values (15) 6522673
19.3%
Uppercase Letter
ValueCountFrequency (%)
D 1039420
22.1%
U 953767
20.3%
I 608663
13.0%
F 303464
 
6.5%
C 293754
 
6.3%
T 262703
 
5.6%
P 190806
 
4.1%
R 174287
 
3.7%
L 138029
 
2.9%
W 127956
 
2.7%
Other values (12) 602151
12.8%
Decimal Number
ValueCountFrequency (%)
8 101
47.6%
0 101
47.6%
1 10
 
4.7%
Space Separator
ValueCountFrequency (%)
2116852
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 542084
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 255400
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2259
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2259
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 38570696
93.0%
Common 2919066
 
7.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 4661397
12.1%
e 4223008
 
10.9%
n 3607901
 
9.4%
t 2883283
 
7.5%
o 2452467
 
6.4%
r 2443759
 
6.3%
s 2152694
 
5.6%
a 2050152
 
5.3%
c 1593290
 
4.1%
l 1285072
 
3.3%
Other values (37) 11217673
29.1%
Common
ValueCountFrequency (%)
2116852
72.5%
/ 542084
 
18.6%
- 255400
 
8.7%
( 2259
 
0.1%
) 2259
 
0.1%
8 101
 
< 0.1%
0 101
 
< 0.1%
1 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 41489762
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 4661397
 
11.2%
e 4223008
 
10.2%
n 3607901
 
8.7%
t 2883283
 
6.9%
o 2452467
 
5.9%
r 2443759
 
5.9%
s 2152694
 
5.2%
2116852
 
5.1%
a 2050152
 
4.9%
c 1593290
 
3.8%
Other values (45) 13304959
32.1%

CONTRIBUTING FACTOR VEHICLE 2
Text

MISSING 

Distinct 61
Distinct (%) < 0.1%
Missing 333967
Missing (%) 15.7%
Memory size 16.2 MiB

Length

Max length 53
Median length 11
Mean length 13.053057
Min length 1

Characters and Unicode

Total characters 23435642
Distinct characters 55
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Unspecified
2nd row Unspecified
3rd row Unspecified
4th row Unspecified
5th row Unspecified
ValueCountFrequency (%)
unspecified 1511559
68.6%
driver 103518
 
4.7%
inattention/distraction 96634
 
4.4%
other 33837
 
1.5%
vehicular 32771
 
1.5%
too 28642
 
1.3%
closely 28642
 
1.3%
passing 22176
 
1.0%
to 21973
 
1.0%
lane 20656
 
0.9%
Other values (96) 302850
 
13.7%

Most occurring characters

ValueCountFrequency (%)
i 3693563
15.8%
e 3595852
15.3%
n 2100622
9.0%
s 1800058
7.7%
c 1705895
7.3%
d 1587123
6.8%
p 1583029
6.8%
f 1569350
6.7%
U 1549134
6.6%
t 634425
 
2.7%
Other values (45) 3616591
15.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20561405
87.7%
Uppercase Letter 2308020
 
9.8%
Space Separator 407844
 
1.7%
Other Punctuation 122141
 
0.5%
Dash Punctuation 35593
 
0.2%
Open Punctuation 295
 
< 0.1%
Close Punctuation 295
 
< 0.1%
Decimal Number 49
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 3693563
18.0%
e 3595852
17.5%
n 2100622
10.2%
s 1800058
8.8%
c 1705895
8.3%
d 1587123
7.7%
p 1583029
7.7%
f 1569350
7.6%
t 634425
 
3.1%
r 554178
 
2.7%
Other values (15) 1737310
8.4%
Uppercase Letter
ValueCountFrequency (%)
U 1549134
67.1%
D 229701
 
10.0%
I 129615
 
5.6%
C 54199
 
2.3%
F 49429
 
2.1%
T 45860
 
2.0%
O 45325
 
2.0%
V 42301
 
1.8%
P 38504
 
1.7%
L 29249
 
1.3%
Other values (12) 94703
 
4.1%
Decimal Number
ValueCountFrequency (%)
8 22
44.9%
0 22
44.9%
1 5
 
10.2%
Space Separator
ValueCountFrequency (%)
407844
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 122141
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 35593
100.0%
Open Punctuation
ValueCountFrequency (%)
( 295
100.0%
Close Punctuation
ValueCountFrequency (%)
) 295
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 22869425
97.6%
Common 566217
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 3693563
16.2%
e 3595852
15.7%
n 2100622
9.2%
s 1800058
7.9%
c 1705895
7.5%
d 1587123
6.9%
p 1583029
6.9%
f 1569350
6.9%
U 1549134
6.8%
t 634425
 
2.8%
Other values (37) 3050374
13.3%
Common
ValueCountFrequency (%)
407844
72.0%
/ 122141
 
21.6%
- 35593
 
6.3%
( 295
 
0.1%
) 295
 
0.1%
8 22
 
< 0.1%
0 22
 
< 0.1%
1 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23435642
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 3693563
15.8%
e 3595852
15.3%
n 2100622
9.0%
s 1800058
7.7%
c 1705895
7.3%
d 1587123
6.8%
p 1583029
6.8%
f 1569350
6.7%
U 1549134
6.6%
t 634425
 
2.7%
Other values (45) 3616591
15.4%

CONTRIBUTING FACTOR VEHICLE 3
Text

MISSING 

Distinct 51
Distinct (%) < 0.1%
Missing 1976273
Missing (%) 92.8%
Memory size 16.2 MiB

Length

Max length 53
Median length 11
Mean length 11.658176
Min length 1

Characters and Unicode

Total characters 1784960
Distinct characters 55
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) < 0.1%

Sample

1st row Unspecified
2nd row Unspecified
3rd row Unspecified
4th row Unspecified
5th row Unspecified
ValueCountFrequency (%)
unspecified 142715
85.8%
other 2939
 
1.8%
vehicular 2899
 
1.7%
driver 2213
 
1.3%
too 2083
 
1.3%
closely 2083
 
1.3%
following 2026
 
1.2%
inattention/distraction 2025
 
1.2%
fatigued/drowsy 853
 
0.5%
pavement 416
 
0.3%
Other values (79) 6110
 
3.7%

Most occurring characters

ValueCountFrequency (%)
e 305034
17.1%
i 303626
17.0%
n 156553
8.8%
s 149901
8.4%
c 149339
8.4%
d 144867
8.1%
p 144427
8.1%
f 143650
8.0%
U 143415
8.0%
o 17843
 
1.0%
Other values (45) 126305
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1598997
89.6%
Uppercase Letter 169178
 
9.5%
Space Separator 13254
 
0.7%
Other Punctuation 3185
 
0.2%
Dash Punctuation 313
 
< 0.1%
Open Punctuation 13
 
< 0.1%
Close Punctuation 13
 
< 0.1%
Decimal Number 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 305034
19.1%
i 303626
19.0%
n 156553
9.8%
s 149901
9.4%
c 149339
9.3%
d 144867
9.1%
p 144427
9.0%
f 143650
9.0%
o 17843
 
1.1%
t 16584
 
1.0%
Other values (15) 67173
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
U 143415
84.8%
D 5705
 
3.4%
O 3277
 
1.9%
V 3197
 
1.9%
F 3125
 
1.8%
C 2579
 
1.5%
I 2566
 
1.5%
T 2343
 
1.4%
P 729
 
0.4%
S 582
 
0.3%
Other values (12) 1660
 
1.0%
Decimal Number
ValueCountFrequency (%)
8 3
42.9%
0 3
42.9%
1 1
 
14.3%
Space Separator
ValueCountFrequency (%)
13254
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 3185
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 313
100.0%
Open Punctuation
ValueCountFrequency (%)
( 13
100.0%
Close Punctuation
ValueCountFrequency (%)
) 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1768175
99.1%
Common 16785
 
0.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 305034
17.3%
i 303626
17.2%
n 156553
8.9%
s 149901
8.5%
c 149339
8.4%
d 144867
8.2%
p 144427
8.2%
f 143650
8.1%
U 143415
8.1%
o 17843
 
1.0%
Other values (37) 109520
 
6.2%
Common
ValueCountFrequency (%)
13254
79.0%
/ 3185
 
19.0%
- 313
 
1.9%
( 13
 
0.1%
) 13
 
0.1%
8 3
 
< 0.1%
0 3
 
< 0.1%
1 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1784960
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 305034
17.1%
i 303626
17.0%
n 156553
8.8%
s 149901
8.4%
c 149339
8.4%
d 144867
8.1%
p 144427
8.1%
f 143650
8.0%
U 143415
8.0%
o 17843
 
1.0%
Other values (45) 126305
7.1%

CONTRIBUTING FACTOR VEHICLE 4
Categorical

IMBALANCE  MISSING 

Distinct 42
Distinct (%) 0.1%
Missing 2094603
Missing (%) 98.4%
Memory size 16.2 MiB
Unspecified 32801
Other Vehicular 648
Following Too Closely 403
Driver Inattention/Distraction 289
Fatigued/Drowsy 170
Other values (37) 467

Length

Max length 43
Median length 11
Mean length 11.492093
Min length 5

Characters and Unicode

Total characters 399672
Distinct characters 51
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 7
Unique (%) < 0.1%

Sample

1st row Unspecified
2nd row Unspecified
3rd row Unspecified
4th row Unspecified
5th row Unspecified

Common Values

Value Count Frequency (%)
Unspecified 32801 1.5%
Other Vehicular 648 < 0.1%
Following Too Closely 403 < 0.1%
Driver Inattention/Distraction 289 < 0.1%
Fatigued/Drowsy 170 < 0.1%
Pavement Slippery 119 < 0.1%
Reaction to Uninvolved Vehicle 43 < 0.1%
Unsafe Speed 34 < 0.1%
Outside Car Distraction 31 < 0.1%
Driver Inexperience 30 < 0.1%
Other values (32) 210 < 0.1%
(Missing) 2094603 98.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 32801
88.1%
other 657
 
1.8%
vehicular 648
 
1.7%
too 408
 
1.1%
closely 408
 
1.1%
following 403
 
1.1%
driver 319
 
0.9%
inattention/distraction 289
 
0.8%
fatigued/drowsy 170
 
0.5%
pavement 122
 
0.3%
Other values (65) 1008
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 69337
17.3%
i 68692
17.2%
n 34973
8.8%
c 34024
8.5%
s 33992
8.5%
p 33176
8.3%
d 33164
8.3%
f 32931
8.2%
U 32913
8.2%
o 3186
 
0.8%
Other values (41) 23284
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 359010
89.8%
Uppercase Letter 37666
 
9.4%
Space Separator 2455
 
0.6%
Other Punctuation 499
 
0.1%
Dash Punctuation 34
 
< 0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 69337
19.3%
i 68692
19.1%
n 34973
9.7%
c 34024
9.5%
s 33992
9.5%
p 33176
9.2%
d 33164
9.2%
f 32931
9.2%
o 3186
 
0.9%
r 2905
 
0.8%
Other values (15) 12630
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 32913
87.4%
D 895
 
2.4%
O 714
 
1.9%
V 698
 
1.9%
F 618
 
1.6%
C 476
 
1.3%
T 439
 
1.2%
I 369
 
1.0%
S 154
 
0.4%
P 150
 
0.4%
Other values (11) 240
 
0.6%
Space Separator
ValueCountFrequency (%)
2455
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 499
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 396676
99.3%
Common 2996
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 69337
17.5%
i 68692
17.3%
n 34973
8.8%
c 34024
8.6%
s 33992
8.6%
p 33176
8.4%
d 33164
8.4%
f 32931
8.3%
U 32913
8.3%
o 3186
 
0.8%
Other values (36) 20288
 
5.1%
Common
ValueCountFrequency (%)
2455
81.9%
/ 499
 
16.7%
- 34
 
1.1%
( 4
 
0.1%
) 4
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 399672
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 69337
17.3%
i 68692
17.2%
n 34973
8.8%
c 34024
8.5%
s 33992
8.5%
p 33176
8.3%
d 33164
8.3%
f 32931
8.2%
U 32913
8.2%
o 3186
 
0.8%
Other values (41) 23284
 
5.8%

CONTRIBUTING FACTOR VEHICLE 5
Categorical

IMBALANCE  MISSING 

Distinct 31
Distinct (%) 0.3%
Missing 2119915
Missing (%) 99.6%
Memory size 16.2 MiB
Unspecified 8923
Other Vehicular 191
Following Too Closely 104
Driver Inattention/Distraction 66
Pavement Slippery 50
Other values (26) 132

Length

Max length 43
Median length 11
Mean length 11.466512
Min length 5

Characters and Unicode

Total characters 108542
Distinct characters 50
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 12
Unique (%) 0.1%

Sample

1st row Unspecified
2nd row Unspecified
3rd row Unspecified
4th row Unspecified
5th row Unspecified

Common Values

Value Count Frequency (%)
Unspecified 8923 0.4%
Other Vehicular 191 < 0.1%
Following Too Closely 104 < 0.1%
Driver Inattention/Distraction 66 < 0.1%
Pavement Slippery 50 < 0.1%
Fatigued/Drowsy 41 < 0.1%
Reaction to Uninvolved Vehicle 12 < 0.1%
Alcohol Involvement 11 < 0.1%
Driver Inexperience 10 < 0.1%
Obstruction/Debris 10 < 0.1%
Other values (21) 48 < 0.1%
(Missing) 2119915 99.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 8923
88.2%
other 193
 
1.9%
vehicular 191
 
1.9%
too 106
 
1.0%
closely 106
 
1.0%
following 104
 
1.0%
driver 76
 
0.8%
inattention/distraction 66
 
0.7%
pavement 51
 
0.5%
slippery 50
 
0.5%
Other values (48) 253
 
2.5%

Most occurring characters

ValueCountFrequency (%)
e 18900
17.4%
i 18646
17.2%
n 9466
8.7%
c 9259
8.5%
s 9204
8.5%
p 9051
8.3%
d 9008
8.3%
f 8950
8.2%
U 8946
8.2%
o 815
 
0.8%
Other values (40) 6297
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 97529
89.9%
Uppercase Letter 10224
 
9.4%
Space Separator 653
 
0.6%
Other Punctuation 121
 
0.1%
Dash Punctuation 11
 
< 0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 18900
19.4%
i 18646
19.1%
n 9466
9.7%
c 9259
9.5%
s 9204
9.4%
p 9051
9.3%
d 9008
9.2%
f 8950
9.2%
o 815
 
0.8%
r 783
 
0.8%
Other values (15) 3447
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 8946
87.5%
D 213
 
2.1%
O 211
 
2.1%
V 205
 
2.0%
F 157
 
1.5%
C 118
 
1.2%
T 112
 
1.1%
I 91
 
0.9%
S 60
 
0.6%
P 54
 
0.5%
Other values (10) 57
 
0.6%
Space Separator
ValueCountFrequency (%)
653
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 121
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 107753
99.3%
Common 789
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 18900
17.5%
i 18646
17.3%
n 9466
8.8%
c 9259
8.6%
s 9204
8.5%
p 9051
8.4%
d 9008
8.4%
f 8950
8.3%
U 8946
8.3%
o 815
 
0.8%
Other values (35) 5508
 
5.1%
Common
ValueCountFrequency (%)
653
82.8%
/ 121
 
15.3%
- 11
 
1.4%
( 2
 
0.3%
) 2
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 108542
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 18900
17.4%
i 18646
17.2%
n 9466
8.7%
c 9259
8.5%
s 9204
8.5%
p 9051
8.3%
d 9008
8.3%
f 8950
8.2%
U 8946
8.2%
o 815
 
0.8%
Other values (40) 6297
 
5.8%

COLLISION_ID
Real number (ℝ)

UNIQUE 

Distinct 2129381
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 3199644.3
Minimum 22
Maximum 4766163
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 16.2 MiB

Quantile statistics

Minimum 22
5-th percentile 107332
Q1 3168465
Median 3700950
Q3 4233531
95-th percentile 4659478
Maximum 4766163
Range 4766141
Interquartile range (IQR) 1065066

Descriptive statistics

Standard deviation 1506545.7
Coefficient of variation (CV) 0.47084786
Kurtosis 0.039430244
Mean 3199644.3
Median Absolute Deviation (MAD) 532533
Skewness -1.2398214
Sum 6.8132618 × 10^12
Variance 2.2696799 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
4455765 1 < 0.1%
3150659 1 < 0.1%
3147811 1 < 0.1%
3151505 1 < 0.1%
3155468 1 < 0.1%
3152392 1 < 0.1%
3146159 1 < 0.1%
3149007 1 < 0.1%
3144858 1 < 0.1%
3144240 1 < 0.1%
Other values (2129371) 2129371 > 99.9%

Minimum values

Value Count Frequency (%)
22 1 < 0.1%
23 1 < 0.1%
24 1 < 0.1%
25 1 < 0.1%
26 1 < 0.1%
27 1 < 0.1%
28 1 < 0.1%
29 1 < 0.1%
30 1 < 0.1%
31 1 < 0.1%

Maximum values

Value Count Frequency (%)
4766163 1 < 0.1%
4766160 1 < 0.1%
4766157 1 < 0.1%
4766156 1 < 0.1%
4766155 1 < 0.1%
4766154 1 < 0.1%
4766152 1 < 0.1%
4766151 1 < 0.1%
4766150 1 < 0.1%
4766148 1 < 0.1%

VEHICLE TYPE CODE 1
Text

Distinct 1721
Distinct (%) 0.1%
Missing 14523
Missing (%) 0.7%
Memory size 16.2 MiB

Length

Max length 38
Median length 35
Mean length 16.864012
Min length 1

Characters and Unicode

Total characters 35664990
Distinct characters 77
Distinct categories 11
Distinct scripts 3
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1045
Unique (%) < 0.1%

Sample

1st row Sedan
2nd row Sedan
3rd row Sedan
4th row Sedan
5th row Dump
ValueCountFrequency (%)
vehicle 898871
18.0%
utility 652404
13.1%
station 652360
13.1%
sedan 643361
12.9%
wagon/sport 472068
9.5%
passenger 416223
8.3%
181720
 
3.6%
wagon 180357
 
3.6%
sport 180291
 
3.6%
truck 88610
 
1.8%
Other values (996) 627949
12.6%

Most occurring characters

ValueCountFrequency (%)
2892576
 
8.1%
S 2797061
 
7.8%
t 2395826
 
6.7%
i 2017724
 
5.7%
E 1819881
 
5.1%
a 1684766
 
4.7%
e 1677834
 
4.7%
n 1610418
 
4.5%
o 1496619
 
4.2%
T 1146555
 
3.2%
Other values (67) 16125730
45.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16261250
45.6%
Uppercase Letter 15676707
44.0%
Space Separator 2892576
 
8.1%
Other Punctuation 653848
 
1.8%
Decimal Number 71070
 
0.2%
Dash Punctuation 54304
 
0.2%
Open Punctuation 27618
 
0.1%
Close Punctuation 27613
 
0.1%
Modifier Symbol 2
 
< 0.1%
Other Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2797061
17.8%
E 1819881
11.6%
T 1146555
 
7.3%
I 1052195
 
6.7%
V 972412
 
6.2%
A 876030
 
5.6%
N 865518
 
5.5%
R 724075
 
4.6%
U 714664
 
4.6%
W 674102
 
4.3%
Other values (18) 4034214
25.7%
Lowercase Letter
ValueCountFrequency (%)
t 2395826
14.7%
i 2017724
12.4%
a 1684766
10.4%
e 1677834
10.3%
n 1610418
9.9%
o 1496619
9.2%
l 982650
6.0%
d 692481
 
4.3%
r 650217
 
4.0%
c 625858
 
3.8%
Other values (15) 2426857
14.9%
Decimal Number
ValueCountFrequency (%)
4 53423
75.2%
6 14404
 
20.3%
2 2682
 
3.8%
3 358
 
0.5%
1 72
 
0.1%
5 51
 
0.1%
0 43
 
0.1%
9 20
 
< 0.1%
8 10
 
< 0.1%
7 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 653817
> 99.9%
. 16
 
< 0.1%
# 8
 
< 0.1%
, 3
 
< 0.1%
' 2
 
< 0.1%
? 1
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2892576
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 54304
100.0%
Open Punctuation
ValueCountFrequency (%)
( 27618
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27613
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%
Control
ValueCountFrequency (%)
 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 31937955
89.5%
Common 3727033
 
10.5%
Cyrillic 2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2797061
 
8.8%
t 2395826
 
7.5%
i 2017724
 
6.3%
E 1819881
 
5.7%
a 1684766
 
5.3%
e 1677834
 
5.3%
n 1610418
 
5.0%
o 1496619
 
4.7%
T 1146555
 
3.6%
I 1052195
 
3.3%
Other values (41) 14239076
44.6%
Common
ValueCountFrequency (%)
2892576
77.6%
/ 653817
 
17.5%
- 54304
 
1.5%
4 53423
 
1.4%
( 27618
 
0.7%
) 27613
 
0.7%
6 14404
 
0.4%
2 2682
 
0.1%
3 358
 
< 0.1%
1 72
 
< 0.1%
Other values (14) 166
 
< 0.1%
Cyrillic
ValueCountFrequency (%)
Х 1
50.0%
Р 1
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 35664987
> 99.9%
Cyrillic 2
 
< 0.1%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2892576
 
8.1%
S 2797061
 
7.8%
t 2395826
 
6.7%
i 2017724
 
5.7%
E 1819881
 
5.1%
a 1684766
 
4.7%
e 1677834
 
4.7%
n 1610418
 
4.5%
o 1496619
 
4.2%
T 1146555
 
3.2%
Other values (64) 16125727
45.2%
Cyrillic
ValueCountFrequency (%)
Х 1
50.0%
Р 1
50.0%
Specials
ValueCountFrequency (%)
� 1
100.0%

VEHICLE TYPE CODE 2
Text

MISSING 

Distinct 1916
Distinct (%) 0.1%
Missing 414316
Missing (%) 19.5%
Memory size 16.2 MiB

Length

Max length 38
Median length 30
Mean length 16.055187
Min length 1

Characters and Unicode

Total characters 27535689
Distinct characters 73
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1141
Unique (%) 0.1%

Sample

1st row Sedan
2nd row Pick-up Truck
3rd row Sedan
4th row Tractor Truck Diesel
5th row Sedan
ValueCountFrequency (%)
vehicle 664573
17.0%
utility 477601
12.2%
station 477571
12.2%
sedan 449584
11.5%
wagon/sport 337367
8.6%
passenger 318613
8.2%
141583
 
3.6%
wagon 140260
 
3.6%
sport 140204
 
3.6%
truck 88068
 
2.3%
Other values (1046) 668544
17.1%

Most occurring characters

ValueCountFrequency (%)
2201871
 
8.0%
S 2067575
 
7.5%
t 1722402
 
6.3%
i 1480548
 
5.4%
E 1440160
 
5.2%
e 1233086
 
4.5%
a 1203904
 
4.4%
n 1143976
 
4.2%
o 1098417
 
4.0%
T 923581
 
3.4%
Other values (63) 13020169
47.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 12752696
46.3%
Lowercase Letter 11934735
43.3%
Space Separator 2201871
 
8.0%
Other Punctuation 479027
 
1.7%
Decimal Number 59230
 
0.2%
Dash Punctuation 54827
 
0.2%
Open Punctuation 26652
 
0.1%
Close Punctuation 26649
 
0.1%
Modifier Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2067575
16.2%
E 1440160
11.3%
T 923581
 
7.2%
N 869510
 
6.8%
I 842271
 
6.6%
V 732110
 
5.7%
A 685891
 
5.4%
U 596321
 
4.7%
O 588177
 
4.6%
R 578386
 
4.5%
Other values (16) 3428714
26.9%
Lowercase Letter
ValueCountFrequency (%)
t 1722402
14.4%
i 1480548
12.4%
e 1233086
10.3%
a 1203904
10.1%
n 1143976
9.6%
o 1098417
9.2%
l 708915
 
5.9%
r 505083
 
4.2%
d 489262
 
4.1%
c 485279
 
4.1%
Other values (15) 1863863
15.6%
Decimal Number
ValueCountFrequency (%)
4 43072
72.7%
6 13696
 
23.1%
2 1963
 
3.3%
3 323
 
0.5%
0 68
 
0.1%
1 58
 
0.1%
5 30
 
0.1%
9 8
 
< 0.1%
8 7
 
< 0.1%
7 5
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 479003
> 99.9%
. 13
 
< 0.1%
' 3
 
< 0.1%
, 3
 
< 0.1%
? 2
 
< 0.1%
# 2
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2201871
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 54827
100.0%
Open Punctuation
ValueCountFrequency (%)
( 26652
100.0%
Close Punctuation
ValueCountFrequency (%)
) 26649
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 24687431
89.7%
Common 2848258
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2067575
 
8.4%
t 1722402
 
7.0%
i 1480548
 
6.0%
E 1440160
 
5.8%
e 1233086
 
5.0%
a 1203904
 
4.9%
n 1143976
 
4.6%
o 1098417
 
4.4%
T 923581
 
3.7%
N 869510
 
3.5%
Other values (41) 11504272
46.6%
Common
ValueCountFrequency (%)
2201871
77.3%
/ 479003
 
16.8%
- 54827
 
1.9%
4 43072
 
1.5%
( 26652
 
0.9%
) 26649
 
0.9%
6 13696
 
0.5%
2 1963
 
0.1%
3 323
 
< 0.1%
0 68
 
< 0.1%
Other values (12) 134
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27535689
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2201871
 
8.0%
S 2067575
 
7.5%
t 1722402
 
6.3%
i 1480548
 
5.4%
E 1440160
 
5.2%
e 1233086
 
4.5%
a 1203904
 
4.4%
n 1143976
 
4.2%
o 1098417
 
4.0%
T 923581
 
3.4%
Other values (63) 13020169
47.3%

VEHICLE TYPE CODE 3
Text

MISSING 

Distinct 274
Distinct (%) 0.2%
Missing 1982003
Missing (%) 93.1%
Memory size 16.2 MiB

Length

Max length 35
Median length 30
Mean length 17.673445
Min length 2

Characters and Unicode

Total characters 2604677
Distinct characters 62
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 162
Unique (%) 0.1%

Sample

1st row Sedan
2nd row Station Wagon/Sport Utility Vehicle
3rd row Sedan
4th row Sedan
5th row Sedan
ValueCountFrequency (%)
vehicle 66051
18.5%
utility 51263
14.3%
station 51260
14.3%
sedan 49302
13.8%
wagon/sport 37901
10.6%
passenger 27716
7.8%
13442
 
3.8%
wagon 13359
 
3.7%
sport 13358
 
3.7%
truck 4550
 
1.3%
Other values (222) 29065
8.1%

Most occurring characters

ValueCountFrequency (%)
210324
 
8.1%
S 206344
 
7.9%
t 190992
 
7.3%
i 157773
 
6.1%
a 128840
 
4.9%
e 128408
 
4.9%
n 126048
 
4.8%
o 116871
 
4.5%
E 116427
 
4.5%
T 77347
 
3.0%
Other values (52) 1145303
44.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1248798
47.9%
Uppercase Letter 1085579
41.7%
Space Separator 210324
 
8.1%
Other Punctuation 51345
 
2.0%
Decimal Number 3644
 
0.1%
Dash Punctuation 3235
 
0.1%
Open Punctuation 876
 
< 0.1%
Close Punctuation 876
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 206344
19.0%
E 116427
10.7%
T 77347
 
7.1%
I 71418
 
6.6%
V 69177
 
6.4%
N 65724
 
6.1%
A 57944
 
5.3%
U 56315
 
5.2%
W 54655
 
5.0%
O 46604
 
4.3%
Other values (15) 263624
24.3%
Lowercase Letter
ValueCountFrequency (%)
t 190992
15.3%
i 157773
12.6%
a 128840
10.3%
e 128408
10.3%
n 126048
10.1%
o 116871
9.4%
l 77268
6.2%
d 52292
 
4.2%
r 46804
 
3.7%
c 45817
 
3.7%
Other values (14) 177685
14.2%
Decimal Number
ValueCountFrequency (%)
4 2999
82.3%
6 442
 
12.1%
2 185
 
5.1%
3 11
 
0.3%
1 3
 
0.1%
8 2
 
0.1%
5 1
 
< 0.1%
0 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
210324
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 51345
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3235
100.0%
Open Punctuation
ValueCountFrequency (%)
( 876
100.0%
Close Punctuation
ValueCountFrequency (%)
) 876
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2334377
89.6%
Common 270300
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 206344
 
8.8%
t 190992
 
8.2%
i 157773
 
6.8%
a 128840
 
5.5%
e 128408
 
5.5%
n 126048
 
5.4%
o 116871
 
5.0%
E 116427
 
5.0%
T 77347
 
3.3%
l 77268
 
3.3%
Other values (39) 1008059
43.2%
Common
ValueCountFrequency (%)
210324
77.8%
/ 51345
 
19.0%
- 3235
 
1.2%
4 2999
 
1.1%
( 876
 
0.3%
) 876
 
0.3%
6 442
 
0.2%
2 185
 
0.1%
3 11
 
< 0.1%
1 3
 
< 0.1%
Other values (3) 4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2604677
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
210324
 
8.1%
S 206344
 
7.9%
t 190992
 
7.3%
i 157773
 
6.1%
a 128840
 
4.9%
e 128408
 
4.9%
n 126048
 
4.8%
o 116871
 
4.5%
E 116427
 
4.5%
T 77347
 
3.0%
Other values (52) 1145303
44.0%

VEHICLE TYPE CODE 4
Text

MISSING 

Distinct 107
Distinct (%) 0.3%
Missing 2095841
Missing (%) 98.4%
Memory size 16.2 MiB

Length

Max length 35
Median length 30
Mean length 18.006619
Min length 2

Characters and Unicode

Total characters 603942
Distinct characters 58
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 50
Unique (%) 0.1%

Sample

1st row Station Wagon/Sport Utility Vehicle
2nd row Sedan
3rd row Station Wagon/Sport Utility Vehicle
4th row Sedan
5th row Sedan
Value | Count | Frequency (%)
vehicle | 15446 | 18.9%
utility | 12272 | 15.0%
station | 12272 | 15.0%
sedan | 11983 | 14.6%
wagon/sport | 9420 | 11.5%
passenger | 5970 | 7.3%
(blank) | 2860 | 3.5%
sport | 2852 | 3.5%
wagon | 2852 | 3.5%
truck | 839 | 1.0%
Other values (107) | 5144 | 6.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 48426 | 8.0%
S | 48100 | 8.0%
t | 47325 | 7.8%
i | 38832 | 6.4%
a | 31508 | 5.2%
e | 31301 | 5.2%
n | 30977 | 5.1%
o | 28759 | 4.8%
E | 24673 | 4.1%
l | 19086 | 3.2%
Other values (48) | 254955 | 42.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 305159 | 50.5%
Uppercase Letter | 236461 | 39.2%
Space Separator | 48426 | 8.0%
Other Punctuation | 12280 | 2.0%
Decimal Number | 727 | 0.1%
Dash Punctuation | 661 | 0.1%
Close Punctuation | 114 | < 0.1%
Open Punctuation | 114 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 48100 | 20.3%
E | 24673 | 10.4%
T | 16125 | 6.8%
V | 15936 | 6.7%
I | 15052 | 6.4%
N | 13719 | 5.8%
U | 13166 | 5.6%
W | 12881 | 5.4%
A | 12217 | 5.2%
O | 9654 | 4.1%
Other values (14) | 54938 | 23.2%
Lowercase Letter
Value | Count | Frequency (%)
t | 47325 | 15.5%
i | 38832 | 12.7%
a | 31508 | 10.3%
e | 31301 | 10.3%
n | 30977 | 10.2%
o | 28759 | 9.4%
l | 19086 | 6.3%
d | 12626 | 4.1%
r | 11113 | 3.6%
c | 10910 | 3.6%
Other values (14) | 42722 | 14.0%
Decimal Number
Value | Count | Frequency (%)
4 | 624 | 85.8%
6 | 58 | 8.0%
2 | 42 | 5.8%
3 | 2 | 0.3%
5 | 1 | 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 48426 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
/ | 12280 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 661 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 114 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 114 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 541620 | 89.7%
Common | 62322 | 10.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 48100 | 8.9%
t | 47325 | 8.7%
i | 38832 | 7.2%
a | 31508 | 5.8%
e | 31301 | 5.8%
n | 30977 | 5.7%
o | 28759 | 5.3%
E | 24673 | 4.6%
l | 19086 | 3.5%
T | 16125 | 3.0%
Other values (38) | 224934 | 41.5%
Common
Value | Count | Frequency (%)
(space) | 48426 | 77.7%
/ | 12280 | 19.7%
- | 661 | 1.1%
4 | 624 | 1.0%
) | 114 | 0.2%
( | 114 | 0.2%
6 | 58 | 0.1%
2 | 42 | 0.1%
3 | 2 | < 0.1%
5 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 603942 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 48426 | 8.0%
S | 48100 | 8.0%
t | 47325 | 7.8%
i | 38832 | 6.4%
a | 31508 | 5.2%
e | 31301 | 5.2%
n | 30977 | 5.1%
o | 28759 | 4.8%
E | 24673 | 4.1%
l | 19086 | 3.2%
Other values (48) | 254955 | 42.2%

VEHICLE TYPE CODE 5
Text

MISSING 

Distinct: 73
Distinct (%): 0.8%
Missing: 2120207
Missing (%): 99.6%
Memory size: 16.2 MiB

Length

Max length: 35
Median length: 30
Mean length: 18.159363
Min length: 2

Characters and Unicode

Total characters: 166594
Distinct characters: 55
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 0.4%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Sedan
5th row: Station Wagon/Sport Utility Vehicle
Value | Count | Frequency (%)
vehicle | 4167 | 18.5%
utility | 3473 | 15.4%
station | 3473 | 15.4%
sedan | 3386 | 15.0%
wagon/sport | 2671 | 11.8%
passenger | 1487 | 6.6%
(blank) | 804 | 3.6%
wagon | 804 | 3.6%
sport | 802 | 3.6%
truck | 258 | 1.1%
Other values (72) | 1230 | 5.5%

Most occurring characters

Value | Count | Frequency (%)
t | 13432 | 8.1%
(space) | 13391 | 8.0%
S | 13224 | 7.9%
i | 11016 | 6.6%
a | 8916 | 5.4%
e | 8867 | 5.3%
n | 8790 | 5.3%
o | 8180 | 4.9%
E | 6130 | 3.7%
l | 5416 | 3.3%
Other values (45) | 69232 | 41.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 86555 | 52.0%
Uppercase Letter | 62765 | 37.7%
Space Separator | 13391 | 8.0%
Other Punctuation | 3475 | 2.1%
Dash Punctuation | 201 | 0.1%
Decimal Number | 161 | 0.1%
Close Punctuation | 23 | < 0.1%
Open Punctuation | 23 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 13432 | 15.5%
i | 11016 | 12.7%
a | 8916 | 10.3%
e | 8867 | 10.2%
n | 8790 | 10.2%
o | 8180 | 9.5%
l | 5416 | 6.3%
d | 3533 | 4.1%
c | 3148 | 3.6%
r | 3141 | 3.6%
Other values (13) | 12116 | 14.0%
Uppercase Letter
Value | Count | Frequency (%)
S | 13224 | 21.1%
E | 6130 | 9.8%
T | 4526 | 7.2%
V | 4281 | 6.8%
I | 4009 | 6.4%
U | 3645 | 5.8%
W | 3573 | 5.7%
N | 3429 | 5.5%
A | 3213 | 5.1%
O | 2625 | 4.2%
Other values (13) | 14110 | 22.5%
Decimal Number
Value | Count | Frequency (%)
4 | 133 | 82.6%
2 | 14 | 8.7%
6 | 13 | 8.1%
3 | 1 | 0.6%
Space Separator
Value | Count | Frequency (%)
(space) | 13391 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
/ | 3475 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 201 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 23 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 23 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 149320 | 89.6%
Common | 17274 | 10.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 13432 | 9.0%
S | 13224 | 8.9%
i | 11016 | 7.4%
a | 8916 | 6.0%
e | 8867 | 5.9%
n | 8790 | 5.9%
o | 8180 | 5.5%
E | 6130 | 4.1%
l | 5416 | 3.6%
T | 4526 | 3.0%
Other values (36) | 60823 | 40.7%
Common
Value | Count | Frequency (%)
(space) | 13391 | 77.5%
/ | 3475 | 20.1%
- | 201 | 1.2%
4 | 133 | 0.8%
) | 23 | 0.1%
( | 23 | 0.1%
2 | 14 | 0.1%
6 | 13 | 0.1%
3 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 166594 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 13432 | 8.1%
(space) | 13391 | 8.0%
S | 13224 | 7.9%
i | 11016 | 6.6%
a | 8916 | 5.4%
e | 8867 | 5.3%
n | 8790 | 5.3%
o | 8180 | 4.9%
E | 6130 | 3.7%
l | 5416 | 3.3%
Other values (45) | 69232 | 41.6%

Interactions

[Figure: 64 interaction plots (Matplotlib v3.7.2 SVG images); not available in this text version]

Missing values

A simple visualization of nullity by column.
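
The numbers behind such a per-column nullity chart are simply missing-value counts. The sketch below computes them with the standard library only; the helper name `missing_by_column` and the representation of rows as dictionaries with `None` for `NaN` are assumptions for illustration. The three rows reuse values from rows 0, 3 and 4 of the sample at the end of this report.

```python
def missing_by_column(rows, columns):
    """Return {column: number of rows where the value is missing (None)}."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


# Three rows taken from the report's sample (NaN written as None).
rows = [
    {"BOROUGH": None, "ON STREET NAME": "WHITESTONE EXPRESSWAY", "OFF STREET NAME": None},
    {"BOROUGH": "BROOKLYN", "ON STREET NAME": None, "OFF STREET NAME": "1211 LORING AVENUE"},
    {"BOROUGH": "BROOKLYN", "ON STREET NAME": "SARATOGA AVENUE", "OFF STREET NAME": None},
]
columns = ["BOROUGH", "ON STREET NAME", "OFF STREET NAME"]

missing = missing_by_column(rows, columns)
for col in columns:
    share = 100.0 * missing[col] / len(rows)
    print(f"{col}: {missing[col]} missing ({share:.1f}%)")
```

Applied to the full dataset, the same counting yields the per-variable "Missing" and "Missing (%)" figures listed for each variable.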
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
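
One common way to obtain a nullity correlation is to turn each column into a 0/1 presence indicator and take the Pearson correlation between indicators. The sketch below does that with the standard library; it assumes this Pearson-on-indicators definition, and the function name `nullity_correlation` is hypothetical. The data are the BOROUGH, LATITUDE and LONGITUDE values of the first ten sample rows, with `NaN` written as `None`.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns NaN when either column is entirely present or entirely missing,
    because the indicator then has zero variance.
    """
    a = [0 if row[col_a] is None else 1 for row in rows]
    b = [0 if row[col_b] is None else 1 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# (BOROUGH, LATITUDE, LONGITUDE) for sample rows 0..9 of this report.
raw = [
    (None, None, None),
    (None, None, None),
    (None, None, None),
    ("BROOKLYN", 40.667202, -73.866500),
    ("BROOKLYN", 40.683304, -73.917274),
    (None, None, None),
    (None, 40.709183, -73.956825),
    ("BRONX", 40.868160, -73.831480),
    ("BROOKLYN", 40.671720, -73.897100),
    ("MANHATTAN", 40.751440, -73.973970),
]
rows = [dict(zip(("BOROUGH", "LATITUDE", "LONGITUDE"), r)) for r in raw]

lat_lon = nullity_correlation(rows, "LATITUDE", "LONGITUDE")
borough_lat = nullity_correlation(rows, "BOROUGH", "LATITUDE")
print(round(lat_lon, 3), round(borough_lat, 3))
```

A value near 1 means two columns tend to be present or missing together, which is what the identical missing counts for LATITUDE, LONGITUDE and LOCATION in the alerts suggest for those three variables.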

Sample

(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN
1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Pavement Slippery | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN
2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN
3 | 09/11/2021 | 9:35 | BROOKLYN | 11208.0 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN
4 | 12/14/2021 | 8:13 | BROOKLYN | 11233.0 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN
5 | 04/14/2021 | 12:47 | NaN | NaN | NaN | NaN | NaN | MAJOR DEEGAN EXPRESSWAY RAMP | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4407458 | Dump | Sedan | NaN | NaN | NaN
6 | 12/14/2021 | 17:05 | NaN | NaN | 40.709183 | -73.956825 | (40.709183, -73.956825) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486555 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN
7 | 12/14/2021 | 8:17 | BRONX | 10475.0 | 40.868160 | -73.831480 | (40.86816, -73.83148) | NaN | NaN | 344 BAYCHESTER AVENUE | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4486660 | Sedan | Sedan | NaN | NaN | NaN
8 | 12/14/2021 | 21:10 | BROOKLYN | 11207.0 | 40.671720 | -73.897100 | (40.67172, -73.8971) | NaN | NaN | 2047 PITKIN AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inexperience | Unspecified | NaN | NaN | NaN | 4487074 | Sedan | NaN | NaN | NaN | NaN
9 | 12/14/2021 | 14:58 | MANHATTAN | 10017.0 | 40.751440 | -73.973970 | (40.75144, -73.97397) | 3 AVENUE | EAST 43 STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486519 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
2129371 | 08/04/2024 | 19:27 | NaN | NaN | 40.610508 | -74.09576 | (40.610508, -74.09576) | STATEN ISLAND EXPRESSWAY | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Following Too Closely | Following Too Closely | Unspecified | NaN | NaN | 4746578 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN
2129372 | 08/06/2024 | 16:25 | NaN | NaN | NaN | NaN | NaN | WEST 16 STREET | NaN | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4746204 | Sedan | Sedan | NaN | NaN | NaN
2129373 | 08/06/2024 | 0:00 | NaN | NaN | NaN | NaN | NaN | GRAND ARMY PLAZA | FLATBUSH AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | NaN | NaN | NaN | NaN | 4745786 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN
2129374 | 08/05/2024 | 17:02 | NaN | NaN | NaN | NaN | NaN | BAINBRIDGE AVENUE | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4746338 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
2129375 | 08/06/2024 | 9:00 | NaN | NaN | 40.664960 | -73.82226 | (40.66496, -73.82226) | BELT PARKWAY | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4745999 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
2129376 | 08/06/2024 | 20:39 | NaN | NaN | NaN | NaN | NaN | CLEARVIEW EXPRESSWAY | 35 AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Traffic Control Disregarded | Unspecified | NaN | NaN | NaN | 4746046 | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN | NaN
2129377 | 08/06/2024 | 12:00 | QUEENS | 11001.0 | NaN | NaN | NaN | JAMAICA AVENUE | LITTLE NECK PARKWAY | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4746496 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
2129378 | 07/29/2024 | 2:30 | NaN | NaN | 40.723442 | -73.93899 | (40.723442, -73.93899) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | Unspecified | NaN | NaN | NaN | 4746455 | Tractor Truck Diesel | Sedan | NaN | NaN | NaN
2129379 | 08/02/2024 | 15:48 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4746469 | Sedan | Pick-up Truck | NaN | NaN | NaN
2129380 | 08/06/2024 | 12:46 | NaN | NaN | NaN | NaN | NaN | EAST 140 STREET | HARLEM RIVER DRIVE | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Unspecified | Unspecified | Unspecified | NaN | NaN | 4746066 | Station Wagon/Sport Utility Vehicle | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN